Statistical Identification of Pleonastic Pronouns
Authors
Abstract
This paper describes an algorithm that identifies pleonastic pronouns using statistical techniques. The training step uses a coreference-annotated corpus of English and focuses on a set of pronouns such as it. As far as we know, there is no corpus annotated for pleonastic pronouns. The main idea of the algorithm is therefore to recast pleonastic pronouns as pronouns that never occur in a coreference chain. We integrated this algorithm into an existing coreference solver (Björkelund and Nugues, 2011) and measured the overall performance gain brought by removing pleonastic it. We observed an improvement of 0.42 over a CoNLL score of 59.15. The complete system (Stamborg et al., 2012) participated in the CoNLL 2012 shared task (Pradhan et al., 2012), where it ranked 4th.
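The recasting described in the abstract can be illustrated with a minimal sketch of how training labels might be derived from coreference annotation alone. This is not the authors' implementation: the function name `label_pronouns`, the span representation, and the toy sentence are all assumptions made for illustration. A target pronoun that appears in no coreference chain is labelled pleonastic; one that appears in a chain is labelled referential.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical representation: a mention is a (start, end) pair of
# inclusive token indices; a chain is a list of coreferent mentions.
Span = Tuple[int, int]


def label_pronouns(
    tokens: List[str],
    chains: Iterable[List[Span]],
    targets: Tuple[str, ...] = ("it",),
) -> Dict[int, bool]:
    """Map each target pronoun's token index to True if it occurs in no
    coreference chain (treated as pleonastic), False otherwise."""
    # Collect every mention span that belongs to some chain.
    mentions_in_chains = {span for chain in chains for span in chain}
    labels: Dict[int, bool] = {}
    for i, token in enumerate(tokens):
        if token.lower() in targets:
            # A single-token pronoun mention has the span (i, i).
            labels[i] = (i, i) not in mentions_in_chains
    return labels


# Toy example (assumed data, not taken from the paper's corpus).
tokens = "It is raining , but John took his umbrella because it is sturdy .".split()
chains = [
    [(7, 8), (10, 10)],  # "his umbrella" ... "it"
    [(5, 5), (7, 7)],    # "John" ... "his"
]
labels = label_pronouns(tokens, chains)
```

In this toy example the first "It" (index 0) is in no chain and is labelled pleonastic, while the second "it" (index 10) corefers with "his umbrella" and is labelled referential. Labels of this kind could then serve as training data for a statistical classifier.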
Related resources
Instance Sampling for Automatic Identification of Arabic Pleonastic Pronouns (M. Abdul-Mageed)
The term anaphora describes backward reference to items previously occurring in a text (see e.g., Mitkov, 2002). The pointing back item is called an anaphor and the item to which it refers is called its antecedent. The identification of an anaphor’s antecedent is termed anaphora resolution and is considered one of the most difficult tasks in natural language processing (NLP) since it relies on ...
Identification of Pleonastic It Using the Web
In a significant minority of cases, certain pronouns, especially the pronoun it, can be used without referring to any specific entity. This phenomenon of pleonastic pronoun usage poses serious problems for systems aiming at even a shallow understanding of natural language texts. In this paper, a novel approach is proposed to identify such uses of it: the extrapositional cases are identified us...
Pronouns Without Explicit Antecedents: How do We Know When a Pronoun is Referential?
Pronouns without explicit noun phrase antecedents pose a problem for any theory of reference resolution. We report here on an empirical study of such pronouns in the Santa Barbara Corpus of Spoken American English, a corpus of spontaneous, casual conversation. Analysis of 2,046 third person personal pronouns in fourteen transcripts indicates that 330 (or 16.1%) lack NP antecedents. These pronou...
A Modular Architecture for Anaphora Resolution
Anaphora resolution attempts to determine the correct antecedent of an anaphor (the term pointing back). In what follows, we propose an algorithm for the resolution of anaphoric pronouns that relies on lexical and syntactic knowledge incorporated in a modular approach based on constraints and preferences. Our objective was to find the correct antecedent to the following subject pronouns (il, il...
Annotating dropped pronouns in Chinese newswire text
We propose an annotation framework to explicitly identify dropped subject pronouns in Chinese. We acknowledge and specify 10 concrete pronouns that exist as words in Chinese and 4 abstract pronouns that do not correspond to Chinese words, but that are recognized conceptually, to native Chinese speakers. These abstract pronouns are identified as “unspecified”, “pleonastic”, “event”, and “existen...